{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "9eb7df83",
   "metadata": {},
   "source": [
    "# Human Pose Inference using Onnx runtime \n",
    "\n",
    "In this example notebook, we describe how to take a pre-trained human pose estimation model for inference using the ***ONNX Runtime*** interface.\n",
    "   - The user can choose the model (see section titled *Choosing a Pre-Compiled Model*)\n",
    "   - The models used in this example were trained on the ***COCO*** dataset because it is a widely used dataset developed for training and benchmarking human pose estimation AI models. \n",
    "   - We perform inference on a few sample images which can be chosen from a drop down list.\n",
    "   - We also describe the input preprocessing and output postprocessing steps, demonstrate how to collect various benchmarking statistics and how to visualize the data.\n",
    "\n",
    "## Choosing a Pre-Compiled Model\n",
    "We provide a set of precompiled artifacts to use with this notebook that will appear as a drop-down list once the second code cell is executed.\n",
    "\n",
    "<img src=docs/images/drop_down.PNG width=\"400\">\n",
    " \n",
    "    \n",
    "## Onnx Runtime based work flow\n",
    "\n",
    "The diagram below describes the steps for Onnx Runtime based work flow. \n",
    "\n",
    "Note:\n",
    "- The user needs to compile models(sub-graph creation and quantization) on a PC to generate model artifacts.\n",
    "    - For this notebook we use pre-compiled models artifacts\n",
    "- The generated artifacts can then be used to run inference on the target.\n",
    "- Users can run this notebook as-is, only actions required are to select a model an the sample image for inference.\n",
    "\n",
    "<img src=docs/images/onnx_work_flow_2.png width=\"400\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "83d3cd29",
   "metadata": {
    "tags": [
     "parameters"
    ]
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import cv2\n",
    "import sys\n",
    "import numpy as np\n",
    "import onnxruntime as rt\n",
    "import ipywidgets as widgets\n",
    "from scripts.utils import get_eval_configs\n",
    "\n",
    "last_artifacts_id = selected_model_id.value if \"selected_model_id\" in locals() else None\n",
    "prebuilt_configs, selected_model_id = get_eval_configs('human_pose_estimation','onnxrt', num_quant_bits = 8, last_artifacts_id = last_artifacts_id)\n",
    "display(selected_model_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d460c08b",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f'Selected Model: {selected_model_id.label}')\n",
    "config = prebuilt_configs[selected_model_id.value]\n",
    "config['session'].set_param('model_id', selected_model_id.value)\n",
    "config['session'].start()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fb302b3f",
   "metadata": {},
   "source": [
    "## Define utility function to preprocess input images\n",
    "Below, we define a utility function to preprocess images for `human pose estimation`. This function takes a path as input, loads the image and preprocesses it for generic ***Onnx*** inference. The steps are as follows: \n",
    "\n",
    " 1. load image\n",
    " 2. convert BGR image to RGB\n",
    " 3. scale image so that the longer edge is 512 pixels\n",
    " 4. pad the smaller edge with (127,127,127) to make it to 512 pixels\n",
    " 5. apply per-channel pixel scaling and mean subtraction\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cacf837e",
   "metadata": {},
   "outputs": [],
   "source": [
    "def preprocess_for_onnx_pose_estimation(image_path, size, mean, scale, layout, reverse_channels, pad_color=114, pad_type=\"center\"):\n",
    "    # Step 1\n",
    "    # read the image using openCVimport json_tricks as json\n",
    "    img = cv2.imread(image_path)\n",
    "    \n",
    "    # Step 2\n",
    "    # convert to RGB\n",
    "    img = img[:,:,::-1]\n",
    "    \n",
    "    # Step 3    \n",
    "    # Most of the onnx models are trained using\n",
    "    # 512x512 images. The general rule of thumb\n",
    "    # is to scale the input image while preserving\n",
    "    # the original aspect ratio so that the\n",
    "    # longer edge is 512 pixels, and then\n",
    "    # pad the scaled image to 512x512\n",
    "    \n",
    "    size = (size,size) if not isinstance(size, (list,tuple)) else size\n",
    "    desired_size = size[-1]\n",
    "    old_size = img.shape[:2] # old_size is in (height, width) format\n",
    "\n",
    "    ratio = float(desired_size)/max(old_size)\n",
    "    new_size = tuple([int(x*ratio) for x in old_size])\n",
    "\n",
    "    # new_size should be in (width, height) format\n",
    "    img = cv2.resize(img, (new_size[1], new_size[0]))\n",
    "\n",
    "    delta_w = size[1] - new_size[1]\n",
    "    delta_h = size[0] - new_size[0]\n",
    "\n",
    "    if pad_type==\"corner\":\n",
    "        top, left = 0, 0\n",
    "        bottom, right = delta_h, delta_w\n",
    "    else:\n",
    "        delta_w = size[1] - new_size[1]\n",
    "        delta_h = size[0] - new_size[0]\n",
    "        top, bottom = delta_h//2, delta_h-(delta_h//2)\n",
    "        left, right = delta_w//2, delta_w-(delta_w//2)\n",
    "\n",
    "\n",
    "    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT,\n",
    "        value=pad_color)\n",
    "    \n",
    "    # Step 4\n",
    "    # Apply scaling and mean subtraction.\n",
    "    # if your model is built with an input\n",
    "    # normalization layer, then you might\n",
    "    # need to skip this\n",
    "    if mean is not None and scale is not None:\n",
    "        img = img.astype('float32')\n",
    "        for mean, scale, ch in zip(mean, scale, range(img.shape[2])):\n",
    "            img[:,:,ch] = ((img.astype('float32')[:,:,ch] - mean) * scale)\n",
    "            \n",
    "    # Step 5\n",
    "    if reverse_channels:\n",
    "        img = img[:,:,::-1]\n",
    "    \n",
    "    # Step 6\n",
    "    img = np.expand_dims(img,axis=0)\n",
    "    img = np.transpose(img, (0, 3, 1, 2))\n",
    "    \n",
    "    return img, top, left, ratio"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "20a8d733",
   "metadata": {},
   "outputs": [],
   "source": [
    "from scripts.utils import get_preproc_props\n",
    "\n",
    "size, mean, scale, layout, reverse_channels = get_preproc_props(config)\n",
    "print(f'Image size: {size}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c2a3b84",
   "metadata": {},
   "source": [
    "## Create the model using the stored artifacts\n",
    "<div class=\"alert alert-block alert-warning\">\n",
    "<b>Warning:</b> It is recommended to use the ONNX Runtime APIs in the cells below without any modifications.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ee664bc9",
   "metadata": {},
   "outputs": [],
   "source": [
    "import onnxruntime as rt\n",
    "\n",
    "onnx_model_path = config['session'].get_param('model_file')\n",
    "delegate_options = {}\n",
    "so = rt.SessionOptions()\n",
    "delegate_options['artifacts_folder'] = config['session'].get_param('artifacts_folder')\n",
    "\n",
    "EP_list = ['TIDLExecutionProvider','CPUExecutionProvider']\n",
    "sess = rt.InferenceSession(onnx_model_path ,providers=EP_list, provider_options=[delegate_options, {}], sess_options=so)\n",
    "\n",
    "input_details = sess.get_inputs()\n",
    "output_details = sess.get_outputs()\n",
    "\n",
    "print('onnx_model_path:',onnx_model_path)\n",
    "print('artifacts_folder:',config['session'].get_param('artifacts_folder'))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4cf06061",
   "metadata": {},
   "source": [
    "## Run the model for inference\n",
    "\n",
    "### Preprocessing and Inference\n",
    "\n",
    "   - You can use a portion of images provided in `/sample-images/pose` directory to evaluate the classification inferences. In the cell below, we use a drop down list to select the image and preprocess it and finally provide it as the input to the network.\n",
    "\n",
    "### Postprocessing and Visualization\n",
    "\n",
    " - Once the inference results are available, we postpocess the results and visualize the pose estimation for the input image.\n",
    " - Pose Estimation models return the results as a list of `numpy.ndarray`, containing one element which is an array with `shape` = `(34,128,128)` and `dtype` = `'float32'`, where the first 17 channels represents the heatmaps foir keypoints and the last 17 channels represent the tag values for each keypoint. The results from the these inferences above are postprocessed using `single_img_visualise` function described in `utils` to get the output image with pose drawn on it.\n",
    " - Then, in this notebook, we use *matplotlib* to plot the final image."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fdcd2ef3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# choose the image to be used for inference from the drop down list\n",
    "img_name = widgets.Dropdown(\n",
    "    options=[('horseman.jpg'), ('ski.jpg'), ('ski_jump.jpg'),('street_walk_2.jpg'),('street_walk_3.jpg')],\n",
    "    value='ski_jump.jpg',\n",
    "    description='Inference_Image:',\n",
    ")\n",
    "display(img_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "53c95af8",
   "metadata": {},
   "outputs": [],
   "source": [
    "from scripts.utils import get_preproc_props\n",
    "# read, preprocess, forward pass on a single chosen image  \n",
    "label=selected_model_id.label\n",
    "pad_color = 128 if 'ae' in label and 'yolo' not in label else 114\n",
    "pad_type = \"corner\" if 'yolox' in label else \"center\"\n",
    "image_name = os.path.join('sample-images','pose',img_name.value)\n",
    "processed_image, top, left, ratio = preprocess_for_onnx_pose_estimation(image_name, size, mean, scale, layout, reverse_channels, pad_color, pad_type)\n",
    "#size, mean, scale, layout, reverse_channels = get_preproc_props(config)    \n",
    "#print(f'Image size: {size}')\n",
    "\n",
    "#processed_image = preprocess(image_name , size, mean, scale, layout, reverse_channels)\n",
    "\n",
    "if not input_details[0].type == 'tensor(float)':\n",
    "    processed_image = np.uint8(processed_image)\n",
    "image_size = processed_image.shape[3]    \n",
    "out_file=None\n",
    "output=None\n",
    "output = list(sess.run(None, {input_details[0].name : processed_image}))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bfc37c3b",
   "metadata": {},
   "outputs": [],
   "source": [
    "#postprocessing on a single image\n",
    "from scripts.utils import single_img_visualise\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "output_image = single_img_visualise(output,image_size,image_name,out_file, top, left, ratio, udp=True, thickness=2, radius=5,label=label)\n",
    "# plot the outut using matplotlib\n",
    "plt.rcParams[\"figure.figsize\"]=20,20\n",
    "plt.rcParams['figure.dpi'] = 200 # 200 e.g. is really fine, but slower\n",
    "plt.imshow(output_image)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "15e1109d",
   "metadata": {},
   "source": [
    "## Plot Inference benchmarking statistics\n",
    "\n",
    " - During the model execution several benchmarking statistics such as timestamps at different checkpoints, DDR bandwidth are collected and stored. `get_TI_benchmark_data()` can be used to collect these statistics. This function returns a dictionary of `annotations` and the corresponding markers.\n",
    " - We provide the utility function plot_TI_benchmark_data to visualize these benchmark KPIs\n",
    "\n",
    "<div class=\"alert alert-block alert-info\">\n",
    "<b>Note:</b> The values represented by <i>Inferences Per Second</i> and <i>Inference Time Per Image</i> uses the total time taken by the inference except the time taken for copying inputs and outputs. In a performance oriented system, these operations can be bypassed by writing the data directly into shared memory and performing on-the-fly input / output normalization.\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a4905f1f",
   "metadata": {},
   "outputs": [],
   "source": [
    "sys.path.append(\"..\") \n",
    "from scripts import utils\n",
    "from scripts.utils import plot_TI_performance_data, plot_TI_DDRBW_data, get_benchmark_output\n",
    "stats = sess.get_TI_benchmark_data()\n",
    "fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10,5))\n",
    "plot_TI_performance_data(stats, axis=ax)\n",
    "plt.show()\n",
    "\n",
    "tt, st, rb, wb = get_benchmark_output(stats)\n",
    "print(f'Statistics : \\n Inferences Per Second   : {1000.0/tt :7.2f} fps')\n",
    "print(f' Inference Time Per Image : {tt :7.2f} ms  \\n DDR BW Per Image        : {rb+ wb : 7.2f} MB')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cbd39303",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "celltoolbar": "Tags",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
